Reliable ABC model choice via random forests

Authors

  • Pierre Pudlo
  • Jean-Michel Marin
  • Arnaud Estoup
  • Jean-Marie Cornuet
  • Mathieu Gautier
  • Christian P. Robert
Abstract

MOTIVATION: Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques.

RESULTS: We propose a novel approach based on a machine learning tool named random forests (RF) to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with RF and postponing the approximation of the posterior probability of the selected model for a second stage also relying on RF. Compared with earlier implementations of ABC model choice, the ABC RF approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least a factor of 50) and (iv) it includes an approximation of the posterior probability of the selected model. The call to RF will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets.

AVAILABILITY AND IMPLEMENTATION: The proposed methodology is implemented in the R package abcrf, available on CRAN.

CONTACT: [email protected]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
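The two-stage idea described in the abstract can be illustrated with a small, self-contained Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper or from the abcrf package: the two toy models (Gaussian versus Laplace noise), the three summary statistics, the helper names (`simulate`, `grow`, `fit_forest`, `predict_forest`), and the simplified forest, which uses bootstrap samples and random feature subsets but picks split thresholds from a few random candidates. Stage 2 is realized here by regressing the out-of-bag misclassification indicator on the summaries, which is one plausible reading of "a second stage also relying on RF".

```python
import random
import statistics

def simulate(model, rng, n=50):
    """Draw one dataset from a toy model and return its summary statistics."""
    scale = rng.uniform(0.5, 2.0)                    # prior on the scale parameter
    if model == 0:                                   # model 0: Gaussian noise
        x = [rng.gauss(0.0, scale) for _ in range(n)]
    else:                                            # model 1: Laplace noise
        x = [rng.expovariate(1.0 / scale) * rng.choice((-1, 1)) for _ in range(n)]
    sd = statistics.pstdev(x)
    mad = statistics.median(abs(v) for v in x)
    kurt = sum(v ** 4 for v in x) / (n * sd ** 4)
    return [sd, mad / sd, kurt]

def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def grow(X, y, rng, depth=0, max_depth=6, min_leaf=5):
    """Grow one regression tree; a leaf is the mean of y, a node is a tuple."""
    mean = sum(y) / len(y)
    if depth == max_depth or len(y) < 2 * min_leaf or min(y) == max(y):
        return mean
    best = None
    for j in rng.sample(range(len(X[0])), 2):        # random feature subset
        for _ in range(10):                          # random candidate thresholds
            t = rng.choice(X)[j]
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return mean
    _, j, t, left, right = best
    return (j, t,
            grow([X[i] for i in left], [y[i] for i in left], rng, depth + 1),
            grow([X[i] for i in right], [y[i] for i in right], rng, depth + 1))

def predict_tree(tree, row):
    while isinstance(tree, tuple):                   # descend until a leaf is reached
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def fit_forest(X, y, rng, n_trees=50):
    """Fit bagged trees and return them with out-of-bag (OOB) predictions."""
    n = len(X)
    trees, oob_sum, oob_cnt = [], [0.0] * n, [0] * n
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        tree = grow([X[i] for i in idx], [y[i] for i in idx], rng)
        trees.append(tree)
        for i in set(range(n)) - set(idx):           # rows this tree never saw
            oob_sum[i] += predict_tree(tree, X[i])
            oob_cnt[i] += 1
    oob = [s / c if c else 0.5 for s, c in zip(oob_sum, oob_cnt)]
    return trees, oob

def predict_forest(trees, row):
    return sum(predict_tree(t, row) for t in trees) / len(trees)

rng = random.Random(1)
models = [rng.randrange(2) for _ in range(400)]      # uniform prior on the model index
table = [simulate(m, rng) for m in models]           # reference table of summaries
observed = simulate(1, rng)                          # stand-in for the observed data

# Stage 1: a forest predicts the model index from the summary statistics.
clf, oob_prob = fit_forest(table, models, rng)
selected = int(predict_forest(clf, observed) > 0.5)

# Stage 2: a second forest regresses the OOB misclassification indicator on the
# summaries; one minus its prediction at the observed data approximates the
# posterior probability of the selected model.
oob_error = [float((p > 0.5) != bool(m)) for p, m in zip(oob_prob, models)]
reg, _ = fit_forest(table, oob_error, rng)
posterior = 1.0 - predict_forest(reg, observed)
print(selected, round(posterior, 3))
```

With only two models, a single regression-tree builder serves both stages, because minimizing squared error on 0/1 labels is equivalent to minimizing Gini impurity. For real analyses, the abcrf R package mentioned in the abstract is the implementation provided by the authors.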

Similar resources

Adaptive ABC model choice and geometric summary statistics for hidden Gibbs random fields

Selecting between different dependency structures of a hidden Markov random field can be very challenging, due to the intractable normalizing constant in the likelihood. We answer this question with approximate Bayesian computation (ABC), which provides a model choice method in the Bayesian paradigm. This comes after the work of Grelaud et al. (2009) who exhibited sufficient statistics on directly...

Bayesian Model Choice using Coupled ABC

In Neal (2010), a novel Approximate Bayesian Computation (ABC) algorithm, coupled ABC, was introduced. This paper shows how coupled ABC can be used in an efficient manner for model choice in a Bayesian framework. The methodology is applied to Gibbs random fields and stochastic epidemic models. Furthermore, a very efficient simulation procedure for Gibbs random fields with a given sufficient summ...

Lack of confidence in approximate Bayesian computation model choice.

Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models. Grelaud et al. [(2009) Bayesian Anal 3:427-442] advocated the use of ABC for model choice in the specific case of Gibbs random fields, relying on an intermodel sufficiency property to show that the approximation was legitimate. We implemented ABC model choice in a wide range of ph...

Lack of confidence in ABC model choice

Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models. Grelaud et al. (2009, Bayesian Anal 3:427–442) advocated the use of ABC for model choice in the specific case of Gibbs random fields, relying on an inter-model sufficiency property to show that the approximation was legitimate. We implemented ABC model choice in a wide range of phyl...

ABC methods for model choice in Gibbs random fields

Gibbs random fields are polymorphous statistical models that can be used to analyse different types of dependence, in particular for spatially correlated data. However, when those models are faced with the challenge of selecting a dependence structure from many, the use of standard model choice methods is hampered by the unavailability of the normalising constant in the Gibbs likelihood. In par...

Journal title:
  • Bioinformatics

Volume 32, Issue 6

Pages -

Publication date: 2016